A survey of static analysis methods for identifying security vulnerabilities in software systems

Authors

  • Marco Pistoia
  • Satish Chandra
  • Stephen J. Fink
  • Eran Yahav
Abstract

memory locations corresponding to EJB fields. Intuitively, fields represent the granularity at which an RBAC policy allows control of restricted data. SAVES distinguishes whether a security-sensitive field f is accessed in read or write mode by a method m. An RBAC policy for a program p can be seen as a function l : R → P(M), where R is the set of roles defined for p, M is the set of methods executed by p, and P(M) is the power set of M. An RBAC policy l is said to be location inconsistent if there exist m ∈ M and q ∈ R such that q has been denied access to m, but the same fields that q could have accessed through m are accessible through other methods to which q has been granted access. A location inconsistency indicates that the intent of the security policy is unclear. Centonze et al. have proved that a method-based RBAC policy l has an equivalent location-based RBAC policy if and only if l is location consistent. Given a program p with an RBAC policy l, SAVES performs a field-sensitive and context-, flow-, and path-insensitive interprocedural mod-ref analysis to determine the sets of fields read and written by each method, and detects potential location inconsistencies for l. If no location inconsistency is detected, SAVES can report the location-based RBAC policy equivalent to l. Experimental results reported in Reference 14 show that the analysis is effective for a number of Java EE applications.

INFORMATION FLOW

In this section we discuss techniques that identify information-flow vulnerabilities in software systems, focusing on integrity and confidentiality as the two main types of vulnerabilities. We discuss the research work to date in this area and describe some recent contributions in more detail.

The data manipulated by a program can be tagged with security levels [40], which naturally assume the structure of a partially ordered set. Under certain conditions, this partially ordered set is a lattice [41, 42]. In the simplest example, the lattice contains only two elements, indicated with high and low. Given a program, the principle of non-interference dictates that the low-security behavior of the program not be affected by any high-security data [43]. Assuming that high means confidential and low means public, verifying that no information ever flows from higher to lower security levels (unless that information has previously been declassified) is equivalent to verifying confidentiality. Conversely, if high means untrusted and low means trusted, then verifying that no information ever flows from higher to lower security levels (unless that information has previously been endorsed) is equivalent to verifying integrity. Vulnerabilities such as those caused by nonvalidated input and injection flaws constitute integrity violations. Collectively, declassification and endorsement are known as downgrading because they allow high-security information to be used in low-security contexts.

Integrity

Data that originate from an untrusted source are referred to as tainted [44]. Tainted data, and the variables that hold or reference them, can be maliciously used for overwrite attacks [44], which may consist, for example, of overwriting the name of a file or a jump address. Sometimes, however, it is necessary to use a tainted variable in a trusted environment when restricted resources are accessed. In such cases, the data can be endorsed by performing sanity checks on it before using it in restricted operations [45].
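The following minimal Java sketch (not taken from the paper; the class, method, and path names are hypothetical) illustrates the idea of endorsement: a tainted value is subjected to a domain-specific sanity check before it is allowed to reach a restricted operation.

    import java.io.FileInputStream;
    import java.io.IOException;

    public class EndorseExample {
        // Domain-specific sanity check: accept only simple log-file names,
        // with no path separators or parent-directory escapes.
        private static boolean isSafeFileName(String name) {
            return name != null && name.matches("[A-Za-z0-9_\\-]{1,64}\\.log");
        }

        // The parameter is tainted: its value may come from an untrusted client.
        public static FileInputStream openLog(String taintedName) throws IOException {
            if (!isSafeFileName(taintedName)) {
                throw new SecurityException("Untrusted file name rejected");
            }
            // After the check, the value is considered endorsed and may be used
            // in the restricted operation (opening a file).
            return new FileInputStream("/var/log/app/" + taintedName);
        }
    }

Without such a check, the tainted name could be used to read or overwrite an arbitrary file, which is precisely the kind of overwrite attack described above.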
Sanity checks are usually domain or component specific. For example, the SQL query in Figure 5 performs a security-sensitive operation based on input coming from a user. If the program did not validate user input, a malicious client could call submitQuery passing ' OR '1'='1 as the value of the password parameter, causing the password check in the SQL query to become password = '' OR '1'='1', which always succeeds.

In SBAC systems, integrity issues commonly arise in the context of privilege-asserting code. The requirement for integrity establishes that no value defined in untrusted code should ever be used inside privilege-asserting code unless that value has been previously endorsed. In an SBAC system, a tainted variable is not necessarily a security problem. It may constitute a security problem if it is also a privileged variable, meaning that it is used inside privilege-asserting code [3]. Even a privileged tainted variable is not necessarily a security problem. In fact, it is appropriate to distinguish two types of privileged tainted variables: if a privileged variable is used to access a restricted resource, that variable is called malicious; otherwise, it is called benign. Because authorization checks are not performed beyond the stack frame invoking doPrivileged in Java or Assert in CLR, an untrusted client application could exploit a malicious variable to have the privilege-asserting code access arbitrary restricted resources on its behalf.

Consider, for example, the TaintedLibrary class shown in Figure 6.

Figure 6. Source code of the TaintedLibrary Java class:

    import java.net.*;
    import java.security.*;

    public class TaintedLibrary {
        public static Socket createSocket(final String host, final int port,
                final String userName) throws Exception {
            Socket s;
            PrivOp op = new PrivOp(host, port, userName);
            try {
                s = (Socket) AccessController.doPrivileged(op);
            } catch (PrivilegedActionException e) {
                throw e.getException();
            }
            return s;
        }
    }

    class PrivOp implements PrivilegedExceptionAction {
        private String host, userName;
        int port;

        PrivOp(String host, int port, String userName) {
            this.host = host;
            this.port = port;
            this.userName = userName;
        }

        public Object run() throws Exception {
            System.out.println("User: " + userName + "; Host: " + host + "; Port: " + port);
            return new Socket(host, port);
        }
    }

Both host and port are tainted variables because an untrusted client can arbitrarily set them. The fact that they are used inside privilege-asserting code to open a socket makes them malicious and constitutes a potential security risk. An untrusted client, with no SocketPermission, can invoke createSocket on the trusted library and have the library open an arbitrary socket connection on its behalf. Conversely, variable userName, though tainted and privileged, is benign because its value is not used to access a restricted resource. In Figure 2, variable logFileName is not tainted because its value cannot be set by a client application.
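The injection scenario described above for Figure 5 can be made concrete with a hedged Java sketch (the method and table names are hypothetical and are not the paper's Figure 5): the vulnerable variant concatenates the tainted password into the query text, while the parameterized variant keeps the tainted values out of the SQL syntax and thus acts as the kind of sanity boundary that endorsement requires.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class QueryExample {
        // Vulnerable: the tainted password value becomes part of the SQL text,
        // so ' OR '1'='1 rewrites the password check itself.
        static ResultSet vulnerable(Connection c, String user, String password) throws SQLException {
            Statement st = c.createStatement();
            return st.executeQuery("SELECT * FROM users WHERE name = '" + user
                    + "' AND password = '" + password + "'");
        }

        // Safer: placeholders pass the tainted values as data, never as query syntax.
        static ResultSet parameterized(Connection c, String user, String password) throws SQLException {
            PreparedStatement st = c.prepareStatement(
                    "SELECT * FROM users WHERE name = ? AND password = ?");
            st.setString(1, user);
            st.setString(2, password);
            return st.executeQuery();
        }
    }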
In RBAC systems, integrity violations can arise due to incorrectly specified principal-delegation policies. A principal-delegation policy overrides the roles granted to the executing user with the roles specified by the policy itself. From that point on, the execution of all the cascading calls is performed as if the user had been granted the roles specified by the principal-delegation policy. A principal-delegation policy is often used to elevate, in special circumstances, the authority of the users executing the program, without making it necessary to grant those users roles that would allow them to execute unintended operations. For example, in the scenario of Figure 4, the principal-delegation policy associated with the component of methods m1 and m4 assigns user bob the role of Manager, which is required to invoke m3. This principal-delegation policy makes it unnecessary to grant user bob the Manager role directly, a grant that could otherwise have been misused. The integrity requirement establishes that no value defined by the user be used after the user's authority has been elevated unless that value has been previously endorsed.

Confidentiality

Confidentiality issues in SBAC systems also arise in the context of privilege-asserting code. The requirement for confidentiality establishes that a value flowing out of a privilege-asserting block of code b should remain confined inside the component of b unless a check has been performed to verify that the value can be safely released. This requirement can also be used when deciding whether it is appropriate to make a block of code privilege asserting. For example, in the Library class shown in Figure 1, it is appropriate to make the FileOutputStream constructor call privilege asserting (as done later in the code of Figure 2), not only because there is no integrity violation but also because the constructed FileOutputStream object remains confined inside the Library class itself. Conversely, the call to the Socket constructor should not be made part of privilege-asserting code. Figure 6 shows that if the call to the Socket constructor were made privilege asserting, there would be not only an integrity violation but also a confidentiality violation, because the constructed Socket object is released to the potentially untrusted client that invoked createSocket.

In RBAC systems, confidentiality violations can arise due to incorrectly specified principal-delegation policies, as is the case for integrity violations. When a principal-delegation policy elevates the roles of a user, all the data defined inside the code executed under that policy should remain confined inside that code. For example, in the scenario of Figure 4, any value defined or computed in the component of m3 and m6 originates under the authority of the Manager role. A flow of information that makes that value accessible to the component of m1 and m4 might violate the confidentiality requirement by allowing users with the role of Employee to access information intended only for users in the role of Manager.

Analysis techniques for information flow

While the accurate detection of information flow is undecidable [46], static analysis can be used to overapproximate information flows in a program in order to ensure information-flow security. In this section, we present a survey of algorithms for checking information-flow security. The basic idea behind using static analysis to detect information flow is to check statically that the flow of information between variables in a program is consistent with the security labeling of those variables. Each variable is labeled with a certain security level. If a variable x is used to derive, or influence, the value of another variable y, there is potential information flow from x to y. The flow is permissible under a security policy only if the policy allows the security level of x to flow to the security level of y.
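As a minimal illustration of checking flows against security labels, the following Java fragment (a sketch assuming the two-level lattice described above; the enum and method names are hypothetical) encodes the rule that information may flow from x to y only if the level of x is allowed to flow to the level of y.

    public enum Level {
        LOW, HIGH;

        // In the two-element lattice, LOW <= LOW, LOW <= HIGH, and HIGH <= HIGH;
        // the only forbidden flow is HIGH -> LOW.
        public boolean canFlowTo(Level other) {
            return this == LOW || other == HIGH;
        }

        // Reject a single flow x => y if dom(x) is not allowed to flow to dom(y).
        public static void checkFlow(Level domX, Level domY) {
            if (!domX.canFlowTo(domY)) {
                throw new SecurityException("Illegal flow: " + domX + " -> " + domY);
            }
        }
    }

For confidentiality, HIGH means secret and LOW means public; for integrity, the roles are reversed, with HIGH meaning untrusted and LOW meaning trusted.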
Formally, let (S, ≤) be a lattice of security levels. For security levels a and b, a ≤ b means that information of level a is allowed to flow into level b. If the lattice models confidentiality, then a is no more secret than b; if the lattice models integrity, then a is at least as trusted as b. We denote the security level of a variable x by dom(x). There exists an explicit flow from x to y if the value of x is assigned to y in the program, as in y := x. There is an implicit flow from x to y if the value of x is used to evaluate the outcome of a conditional, which then controls an assignment to y, as in if (x > 0) then y := 1 else y := 0 endif. We denote an explicit or implicit flow from x to y by x ⇒ y.

Denning and Denning's certification of programs for secure information flow [40] uses the following proposition: a program respects the security policy implied by the lattice (S, ≤) when, for any flow x ⇒ y, it is the case that dom(x) ≤ dom(y). To enforce a program's information-flow security, this property, which defines a sufficient but not necessary condition, must be verified for each flow in the program.

Goguen and Meseguer [43] have given a more general notion of information-flow security based on non-interference. Informally, the non-interference principle says that an observer must not be able to see variation in "low-security" outputs that is derived from variation in "high-security" inputs (so the observer cannot make any inference about high-security information). Suppose a computation undergoes the following transition between input and output states: (H1, L1) → (H2, L2), where H1 and H2 are the high-security components and L1 and L2 are the low-security components of the states. For this computation to be secure, it must be the case that, for any other value H'1 of the high-security component, the low-security output does not change: (H'1, L1) → (H'2, L2). Remember that the interpretation of the terms "high security" and "low security" depends on the problem being solved: for confidentiality, higher and lower security mean more and less secret, respectively; for integrity, they mean less and more trusted, respectively.

For example, let h be a secret variable and l be a public variable. Then the program input h; if (h > 0) then l := 1 else l := 0 endif; output l violates non-interference, because different initial values of the high input h can result in different final values of the low output l.

Formally, let ⇝ ⊆ S × S be an interference relation on security levels. If a ⇝ b, the security level a is allowed to interfere with the security level b, in the sense that it can impact observable values at level b. Generally, we are interested in the complement of this relation, written a ⇝̸ b, which prohibits a from interfering with b. Note that ⇝ must be reflexive, but in general it need not be transitive. While Goguen and Meseguer presented non-interference in a more abstract setting of "actions," our presentation here is limited to program statements. We label each input or output statement x with its security level dom(x) (generalizing dom to apply to statements). The security level of an input statement is the security level of the input value being provided to the computation. The security level of an output statement is the security level of the variable being made visible externally to the computation. Let run be the state-update function run : State × Statement → State.
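The violation in the example program above involving h and l can be made concrete with a small Java sketch (not from the paper; the class and method names are hypothetical): the program is modeled as a function from the high input h to the low output l, and non-interference is probed by varying only h.

    public class NonInterferenceDemo {
        // Models "input h; if (h > 0) then l := 1 else l := 0 endif; output l".
        static int program(int h) {
            int l;
            if (h > 0) {
                l = 1;
            } else {
                l = 0;
            }
            return l;
        }

        public static void main(String[] args) {
            int lowOut1 = program(0);  // one value of the high input
            int lowOut2 = program(7);  // a different value of the high input
            // Non-interference requires identical low outputs; here they differ,
            // so the high input interferes with the low output.
            System.out.println("low outputs: " + lowOut1 + " and " + lowOut2);
        }
    }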
Let purge be a function that, given a trace of statements a, removes from it all statements x whose security level must not interfere with a given security level t. It is defined as follows, where x·a denotes the trace whose first statement is x and whose remainder is a: purge(x·a, t) := x·purge(a, t) if dom(x) ⇝ t, and purge(x·a, t) := purge(a, t) otherwise. Let s0 be the initial state of the program. Then the security criterion can be stated as follows: a program respects the security policy implied by a non-interference relation ⇝̸ if, for any sequence of statements a in an execution ending in a statement z:

    output(run(s0, a), z) = output(run(s0, purge(a, dom(z))), z)

That is, the output produced at statement z must be identical even if all previous statements whose security level must not interfere with the security level of z have been purged. In the previous example, on any run in which input h is purged (and the default value of h, assumed to be 0, is in effect), the output for l is 0, whereas a non-purged run with an input value of 1 or higher would produce the output 1 for l. For the sake of contrast, consider a slightly modified example: input h; if (h > 0) then l := 1 else l := 0 endif; l := 2; output l. In this example, even if the statement input h is purged from any run, the output for l is 2, which is the same as for any non-purged run, and so the conditions for non-interference are satisfied.

Non-interference is a more general criterion than our first criterion of secure information flow, in that it only constrains the projection of outputs produced from actual statement sequences; it does not constrain implicit or explicit flow at each statement. Note that Denning and Denning's criterion would have rejected the modified example above because an implicit flow violating the security-lattice rule does exist. Volpano et al. [47] have presented a type-based algorithm that certifies implicit and explicit flows similarly to the first criterion and also ensures non-interference. Non-interference is traditionally the technical criterion used for proving the correctness of security-analysis algorithms or type systems. However, it is also harder to check non-interference directly.

Secure information flow is important in the context of Web applications. A number of approaches for reasoning about the flow of information in systems with mutual distrust have been proposed. For example, Myers and Liskov [48] use static analysis for certifying information flow control and avoiding costly runtime checks.

In Java and CLR, information-flow issues are particularly relevant to privilege-asserting code. Privilege-asserting code has historical roots in the 1970s. The Digital Equipment Corporation (DEC) Virtual Address eXtension/Virtual Memory System (VAX/VMS) operating system had a feature similar to the doPrivileged method in Java 2 and the Assert method in CLR. The VAX/VMS feature was called privileged images. Privileged images were similar to UNIX setuid programs [15], except that privileged images ran in the same process as all of the user's other unprivileged programs. Thus, they were considerably easier to attack than UNIX setuid programs because they lacked the usual separate-process and separate-address-space protections. One example of an attack on privileged images is demonstrated in a paper by Koegel, Koegel, Li, and Miruke [49].

The notion of tainted variables as vehicles for integrity violations became known with the Perl language. In Perl, the -T option enables the detection of tainted variables [50].
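In the spirit of Perl's taint mode, the following hedged Java sketch (the class is hypothetical and not a real library API) tracks a taint flag on string values at runtime and refuses to let a tainted value reach a sensitive sink until it has been explicitly endorsed.

    public final class TaintedString {
        private final String value;
        private final boolean tainted;

        private TaintedString(String value, boolean tainted) {
            this.value = value;
            this.tainted = tainted;
        }

        // Values obtained from untrusted sources start out tainted.
        public static TaintedString fromUntrustedSource(String value) {
            return new TaintedString(value, true);
        }

        // Taint propagates through derived values such as concatenation.
        public TaintedString concat(TaintedString other) {
            return new TaintedString(this.value + other.value, this.tainted || other.tainted);
        }

        // Endorsement: a caller-supplied sanity check clears the taint flag.
        public TaintedString endorseIf(java.util.function.Predicate<String> sanityCheck) {
            if (tainted && !sanityCheck.test(value)) {
                throw new SecurityException("Sanity check failed; value remains tainted");
            }
            return new TaintedString(value, false);
        }

        // Sensitive sinks accept only untainted values.
        public String useInRestrictedOperation() {
            if (tainted) {
                throw new SecurityException("Tainted value used in a restricted operation");
            }
            return value;
        }
    }

A dynamic scheme like this detects misuse only on the executions that actually occur; the static analyses discussed next aim to flag all such flows before the program runs.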
Shankar, Talwar, Foster, and Wagner present a tainted-variable analysis for CQual using constraint graphs [51]. To find format-string bugs, CQual uses a type-qualifier system with two qualifiers: tainted and untainted [52]. The types of values that can be controlled by an untrusted adversary are qualified as tainted, and the rest of the variables are qualified as untainted. A constraint graph is constructed for a CQual program. If there is a path from a tainted node to an untainted node in the graph, an error is flagged.

Newsome and Song propose a dynamic tainted-variable analysis that catches errors by monitoring tainted variables at runtime [44]. Data originating from, or arithmetically derived from, untrusted sources, such as the network, are marked as tainted. Tainted variables are tracked at runtime, and when they are used in a dangerous way, an attack is detected. Ashcraft and Engler [45] also use tainted-variable analysis to detect software attacks due to tainted variables. Their approach provides user-defined sanity checks to untaint potentially tainted variables. Pistoia [26] proposes an algorithm based on program slicing to automatically discover malicious tainted variables in a library. This approach can be used to decide whether a portion of library code should be made privileged or not.

Hammer, Krinke, and Snelting's algorithm

Snelting et al. [53] make the observation that program dependence graphs (PDGs) and non-interference are related in the following manner. Consider two statements s1 and s2. If dom(s1) ⇝̸ dom(s2), then, in a security-correct program, it must be the case that s1 ∉ backslice(s2). Here, backslice is the function that maps each statement s to its static backward slice, consisting of all the (transitive) predecessors of s along control- and data-dependence edges in the PDG. Based on this observation, Hammer et al. [54] have presented an algorithm that checks for non-interference: for any output statement s, it must be the case that backslice(s) contains only statements that have a lower security label than s. Hammer et al. also refine slices with path conditions to obtain higher accuracy, but we elide the details here. Note that even a PDG-based computation, as in the above technique, is only an approximation of the ideal of non-interference.

We assume that the reader is familiar with PDGs and slicing, as these are standard concepts in program analysis. Here, we present just an example to illustrate the idea in the context of information-flow security. Consider the program shown in Figure 7. (Figure 8 shows the PDG for this program.)

[Figure 7. Example illustrating information-flow algorithms. Security labeling: h1, topSecretFile: TOPSEC; h2, confidentialFile: CONF; l1: LO1; l2: LO2; m, publicFile: PUB]

Edges that are in the backward slice from the output statement are shown in red. It is clear that the backward slice of the output statement includes the higher-security input statements, which must not interfere with the output statement (assume that ⇝ coincides with ≤ in the security lattice shown in Figure 7). Note also that a more sophisticated program verifier may be able to reason that the outcome of the branch at line 10 is always false; an ideal checker for non-interference would therefore not report a security violation. It should be noted that PDG-based algorithms, such as the one above, have not been shown to scale to large applications of the size of several hundred thousand lines of code. Flow-insensitive approaches, such as Denning and Denning's algorithm and several type-system-based algorithms, can scale better.
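A hedged sketch of the backward-slice check described above (the data structures and names are illustrative and are not Hammer et al.'s implementation): each statement records the statements it directly depends on via control- or data-dependence edges, the backward slice is computed by transitive closure over those edges, and an output statement is flagged if its slice contains a statement whose level is not allowed to interfere with the output's level.

    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class BackSliceChecker {
        // Security levels, ordered from public to secret.
        enum Level { PUB, CONF, TOPSEC }

        // Assumption: interference coincides with the lattice order,
        // so level a may interfere with level b only if a <= b.
        static boolean mayInterfere(Level a, Level b) {
            return a.ordinal() <= b.ordinal();
        }

        private final Map<Integer, Level> levelOf = new HashMap<>();
        private final Map<Integer, Set<Integer>> dependsOn = new HashMap<>();

        void addStatement(int id, Level level) {
            levelOf.put(id, level);
        }

        // Record a control- or data-dependence edge: 'stmt' depends on 'pred'.
        void addDependence(int stmt, int pred) {
            dependsOn.computeIfAbsent(stmt, k -> new HashSet<>()).add(pred);
        }

        // Backward slice: all transitive dependence predecessors of 'stmt'.
        Set<Integer> backslice(int stmt) {
            Set<Integer> slice = new HashSet<>();
            Deque<Integer> worklist = new ArrayDeque<>(dependsOn.getOrDefault(stmt, Set.of()));
            while (!worklist.isEmpty()) {
                int s = worklist.pop();
                if (slice.add(s)) {
                    worklist.addAll(dependsOn.getOrDefault(s, Set.of()));
                }
            }
            return slice;
        }

        // Flag the output statement if some statement in its backward slice has a
        // level that must not interfere with the output statement's level.
        boolean violatesNonInterference(int outputStmt) {
            Level outLevel = levelOf.get(outputStmt);
            for (int s : backslice(outputStmt)) {
                if (!mayInterfere(levelOf.get(s), outLevel)) {
                    return true;
                }
            }
            return false;
        }
    }

As noted above, this is a conservative approximation: a slice computed purely from dependence edges may flag a program, such as one whose offending branch is always false, that an ideal non-interference checker would accept.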
Type-system-based algorithms also enjoy the advantage of compositional analysis, which means that parts of a program can be analyzed in isolation; this is generally hard to do in PDG-based analysis.

Dealing with heap data

In this section, we describe an algorithm for performing non-interference analysis in the presence of heap-allocated data structures, which are very common in Java and other object-oriented languages. The analysis of heap-based objects is an entire area of research in itself, and even a brief survey is beyond the scope of this article; a recent research paper [55] gives a good overview of the current state of the art. In the present paper, we focus on a recent algorithm by Livshits and Lam [56] that is engineered to work well for tainted-variable analysis of large Java applications.

Livshits and Lam's analysis requires the prior computation of a specific heap analysis called flow-insensitive points-to analysis. This analysis computes a "may point to" relation over a program, where pointsTo(o1.f, o2) means that field f of the object named o1 might refer to the object named o2 in some execution of the program. A points-to relation is also computed for local variables: pointsTo(t, o) means that the local variable t might refer to the object named o. The relation pointsTo(t.f, o2) holds if there exists an o1 such that pointsTo(t, o1) and pointsTo(o1.f, o2). The pointsTo relation is the same for the entire program, ignoring the control flow of the program. (By contrast, the PDG-based algorithm of Hammer et al. handles heap objects in a flow-sensitive manner, albeit at a much higher cost.) We refer the reader to a paper by Whaley and Lam [57] that describes the details of the heap analysis used by Livshits and Lam.

Tainted-variable analysis is an integrity problem in which we are interested in whether less-trusted data obtained from the user might influence other data that the system trusts. Clearly, to perform this analysis, one needs to identify sources and sinks of possibly tainted data. For Java, this amounts to identifying methods that originate a tainted value and methods that use a possibly tainted value. The Livshits and Lam algorithm gets this information ...
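A hedged sketch of the flow-insensitive pointsTo relation described above (the representation is illustrative, not Whaley and Lam's implementation): abstract objects are named by strings, the two base relations are stored as sets, and pointsTo(t.f, o2) is derived from pointsTo(t, o1) and pointsTo(o1.f, o2).

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    public class PointsTo {
        // pointsTo(t, o): local variable t may refer to abstract object o.
        private final Map<String, Set<String>> varPointsTo = new HashMap<>();
        // pointsTo(o1.f, o2): field f of abstract object o1 may refer to o2.
        private final Map<String, Map<String, Set<String>>> fieldPointsTo = new HashMap<>();

        void addVarPointsTo(String var, String obj) {
            varPointsTo.computeIfAbsent(var, k -> new HashSet<>()).add(obj);
        }

        void addFieldPointsTo(String obj, String field, String target) {
            fieldPointsTo.computeIfAbsent(obj, k -> new HashMap<>())
                         .computeIfAbsent(field, k -> new HashSet<>())
                         .add(target);
        }

        // pointsTo(t.f, o2) holds if there is an o1 with pointsTo(t, o1)
        // and pointsTo(o1.f, o2).
        Set<String> varFieldPointsTo(String var, String field) {
            Set<String> result = new HashSet<>();
            for (String o1 : varPointsTo.getOrDefault(var, Set.of())) {
                Map<String, Set<String>> fields = fieldPointsTo.getOrDefault(o1, Map.of());
                result.addAll(fields.getOrDefault(field, Set.of()));
            }
            return result;
        }
    }

Because the relation ignores control flow, the same sets answer queries at every program point; a tainted-variable analysis can then seed the objects produced by source methods and ask whether any of them may reach the arguments of sink methods.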


Journal:
  • IBM Systems Journal

Volume 46, Issue 2

Pages: -

Publication date: 2007